For a linear transformation $t: V \to V$, the domain and codomain are the same, so we can compose $t$ with itself: $t^2 = t \circ t$ and $t^3 = t \circ t \circ t$. In general, $t^{i+j} = t^i \circ t^j$.
Note that the exponent notation matches that of matrices, with $\operatorname{Rep}_{B,B}(t^j) = T^j$ where $T = \operatorname{Rep}_{B,B}(t)$.
Let $t$ be a linear transformation on $V$. Where $f(x) = c_n x^n + \cdots + c_1 x + c_0$ is a polynomial, $f(t) = c_n t^n + \cdots + c_1 t + c_0\,\mathrm{id}$ is a transformation on $V$. Similarly, if $T$ is a square matrix then $f(T) = c_n T^n + \cdots + c_1 T + c_0 I$.
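To make this concrete, here is a minimal NumPy sketch (the helper name `poly_at_matrix`, the coefficient ordering, and the sample matrix are illustrative assumptions, not from the text) that evaluates $f(T)$ by Horner's rule:

```python
import numpy as np

def poly_at_matrix(coeffs, T):
    """Evaluate f(T) = c_n T^n + ... + c_1 T + c_0 I by Horner's rule.

    `coeffs` lists [c_n, ..., c_1, c_0], highest degree first.
    """
    n = T.shape[0]
    result = np.zeros((n, n))
    for c in coeffs:
        result = result @ T + c * np.eye(n)
    return result

T = np.array([[2.0, 1.0], [0.0, 3.0]])
# f(x) = x^2 - 5x + 6 = (x - 2)(x - 3); since 2 and 3 are the
# eigenvalues of T, f(T) comes out as the zero matrix.
print(poly_at_matrix([1, -5, 6], T))
```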
We must prove the following:
Let $f(x) = c_n x^n + \cdots + c_1 x + c_0$ and $g(x) = d_m x^m + \cdots + d_1 x + d_0$ be two polynomials and $t$ be a linear transformation on $V$. If $s(x) = f(x) + g(x)$ is the sum of $f(x)$ and $g(x)$ and $p(x) = f(x)g(x)$ is the product, then $s(t) = f(t) + g(t)$ and $p(t) = f(t) \circ g(t)$.
Proof
The addition case is simple. To prove the product case, for some $v \in V$ write
$$f(t) \circ g(t)\,(v) = f(t)\bigl(g(t)(v)\bigr) = (c_n t^n + \cdots + c_1 t + c_0\,\mathrm{id}_V)\bigl(g(t)(v)\bigr) = c_n t^n\bigl(g(t)(v)\bigr) + \cdots + c_1 t\bigl(g(t)(v)\bigr) + c_0\bigl(g(t)(v)\bigr)$$
From $g(x)$ and the fact that $t^i$ is a linear map,
$$c_i t^i\bigl(g(t)(v)\bigr) = c_i t^i\bigl(d_m t^m(v) + \cdots + d_1 t(v) + d_0(v)\bigr) = c_i d_m\,t^{i+m}(v) + \cdots + c_i d_1\,t^{i+1}(v) + c_i d_0\,t^i(v)$$
This is simply $p_i(t)(v)$, where $p_i(x) = c_i d_m x^{i+m} + \cdots + c_i d_1 x^{i+1} + c_i d_0 x^i = c_i x^i(d_m x^m + \cdots + d_1 x + d_0) = c_i x^i g(x)$.
Thus, from above,
$$f(t) \circ g(t)\,(v) = p_n(t)(v) + \cdots + p_1(t)(v) + p_0(t)(v)$$
From the sum case, we obtain $f(t) \circ g(t)\,(v) = S(t)(v)$ where
$$S(x) = c_n x^n g(x) + \cdots + c_1 x g(x) + c_0 g(x) = (c_n x^n + \cdots + c_1 x + c_0)\,g(x) = f(x)g(x) = p(x)$$
Hence, $f(t) \circ g(t)\,(v) = p(t)(v)$ for all $v \in V$, proving the result.
Thus, since $f(x)g(x) = g(x)f(x)$, we have $f(t) \circ g(t) = g(t) \circ f(t)$. Furthermore, the range space $\mathscr{R}(f(t))$ and null space $\mathscr{N}(f(t))$ are stable (or invariant) under $g(t)$, i.e. $g(t)\bigl(\mathscr{R}(f(t))\bigr) \subseteq \mathscr{R}(f(t))$ and $g(t)\bigl(\mathscr{N}(f(t))\bigr) \subseteq \mathscr{N}(f(t))$.
Proof
If $v \in \mathscr{R}(f(t))$, then we can find a $w \in V$ such that $v = f(t)(w)$. Hence, $g(t)(v) = g(t)\bigl(f(t)(w)\bigr) = f(t)\bigl(g(t)(w)\bigr) \in \mathscr{R}(f(t))$.
Similarly, if $v \in \mathscr{N}(f(t))$, then $f(t)\bigl(g(t)(v)\bigr) = g(t)\bigl(f(t)(v)\bigr) = g(t)(0) = 0$, so $g(t)(v) \in \mathscr{N}(f(t))$.
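As a quick numeric sanity check of the commutativity claim above (a sketch assuming NumPy; the random matrix and the polynomials $f$ and $g$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))
I = np.eye(4)

fT = T @ T + 2 * T + I   # f(x) = x^2 + 2x + 1
gT = 3 * T - 5 * I       # g(x) = 3x - 5

# Both orders of composition equal (fg)(T), so they agree.
print(np.allclose(fT @ gT, gT @ fT))  # True
```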
Let $t$ be a linear transformation on a vector space $V$. Suppose $v$ is an eigenvector of $t$ associated with eigenvalue $\lambda$. For every polynomial $f(x)$, $v$ is an eigenvector of $f(t)$ associated with eigenvalue $f(\lambda)$.
Proof
We have $t^i(v) = \lambda^i v$. Therefore, if $f(x) = c_n x^n + \cdots + c_1 x + c_0$, then
$$f(t)(v) = c_n t^n(v) + \cdots + c_1 t(v) + c_0 v = c_n \lambda^n v + \cdots + c_1 \lambda v + c_0 v = f(\lambda)\,v$$
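A quick numeric illustration (a sketch assuming NumPy; the matrix and polynomial are arbitrary examples):

```python
import numpy as np

T = np.array([[2.0, 1.0], [0.0, 5.0]])
eigvals, eigvecs = np.linalg.eig(T)
lam, v = eigvals[0], eigvecs[:, 0]   # an eigenpair of T

fT = T @ T - 3 * T + 4 * np.eye(2)   # f(x) = x^2 - 3x + 4
# v is an eigenvector of f(T) with eigenvalue f(lambda)
print(np.allclose(fT @ v, (lam**2 - 3 * lam + 4) * v))  # True
```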
Minimal Polynomial
We now aim to study the set of polynomials $p(x)$ such that, for a given linear transformation $t: V \to V$ (or square matrix $T$), $p(t) = 0$, the zero map (or $p(T) = 0$, the zero matrix).
If $T$ is a square matrix, we can find a non-zero polynomial $p(x)$ such that $p(T) = 0$. Similarly, if $t: V \to V$ is a linear transformation, we can find a non-zero polynomial $p(x)$ such that $p(t) = 0$.
Proof
It suffices to prove the matrix case, since $p(t) = 0$ exactly when $p(T) = 0$ for $T = \operatorname{Rep}_{B,B}(t)$.
Since the vector space $\mathcal{M}_{n \times n}$ has dimension $n^2$, the $n^2 + 1$ member set $\{I, T, T^2, \ldots, T^{n^2}\}$ must be linearly dependent. Thus, there exist scalars $c_0, \ldots, c_{n^2}$, not all zero, such that $c_{n^2} T^{n^2} + \cdots + c_1 T + c_0 I = 0$. Hence, the polynomial $p(x) = c_{n^2} x^{n^2} + \cdots + c_1 x + c_0$ is a non-zero polynomial for which $p(T) = 0$.
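The existence argument is constructive and can be carried out directly: stack the vectorized powers $I, T, \ldots, T^{n^2}$ as columns and read a dependence off the null space. A sketch assuming SymPy; the sample matrix is arbitrary:

```python
import sympy as sp

T = sp.Matrix([[1, 2], [3, 4]])
n = T.rows

# n^2 + 1 columns living in an n^2-dimensional space must be dependent.
powers = [T**i for i in range(n**2 + 1)]
M = sp.Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
c = M.nullspace()[0]   # coefficients c_0, ..., c_{n^2} of a vanishing polynomial

pT = sum((c[i] * powers[i] for i in range(n**2 + 1)), sp.zeros(n, n))
print(pT)  # the zero matrix: p(T) = 0
```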
From the proof above, we see that a polynomial of degree $n^2$ always suffices, but sometimes a smaller-degree polynomial is enough.
Example 16.1
Consider the matrix
$$T = \begin{pmatrix} \cos(\pi/3) & -\sin(\pi/3) \\ \sin(\pi/3) & \cos(\pi/3) \end{pmatrix} = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix}$$
The polynomial $p(x) = x^2 - x + 1$ satisfies $p(T) = 0$.
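Checking this numerically (a sketch assuming NumPy):

```python
import numpy as np

c, s = np.cos(np.pi / 3), np.sin(np.pi / 3)
T = np.array([[c, -s], [s, c]])

pT = T @ T - T + np.eye(2)   # p(x) = x^2 - x + 1
print(np.allclose(pT, np.zeros((2, 2))))  # True: p(T) = 0
```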
If $f(x)$ is a polynomial that takes a matrix $T$ (or linear map $t$) to zero, then every eigenvalue of $T$ (or $t$) is a root of $f(x)$.
Proof
Let $v$ be an eigenvector associated with eigenvalue $\lambda$ of $t$. Then $f(t)(v) = f(\lambda)\,v$. Since $f(t) = 0$, we have $f(\lambda)\,v = 0 \implies f(\lambda) = 0$, because $v \neq 0$ as eigenvectors are non-zero.
The minimal polynomial $m(x)$ of a transformation $t$ or square matrix $T$ is the non-zero polynomial of least degree and leading coefficient $1$ such that $m(t) = 0$ or $m(T) = 0$.
Since the leading coefficient must be $1$, $m(x)$ cannot be a constant: the only such constant polynomial is $m(x) = 1$, and it does not give the zero map. So $m(x)$ has degree at least one.
Example 16.2
The zero matrix has minimal polynomial $m(x) = x$.
The identity matrix has minimal polynomial $m(x) = x - 1$.
Any transformation/square matrix has a unique minimal polynomial.
Proof
First, we prove existence.
Since some polynomial exists that takes the map or matrix to zero, we take one of smallest degree and divide by its leading coefficient to make it $1$. This satisfies the conditions, so a minimal polynomial exists.
Next, we prove uniqueness.
Suppose $m(x)$ and $\hat{m}(x)$ are two polynomials that take the map or matrix to zero, are both of minimal (hence equal) degree, and have leading coefficient $1$. Consider the difference $m(x) - \hat{m}(x)$. If this is not zero, then it has a nonzero leading coefficient. Dividing by that coefficient gives a polynomial that takes the map or matrix to zero, has leading coefficient $1$, and has smaller degree than $m$ and $\hat{m}$. This contradicts the minimality of $m$ and $\hat{m}$; thus $m(x) - \hat{m}(x) = 0$, making the two equal.
We strengthen the earlier claim that the eigenvalues are roots of $f(x)$:
The roots of the minimal polynomial are exactly the eigenvalues of a square matrix (or linear map).
Proof
Let $m(x)$ be the minimal polynomial of the linear map $t: V \to V$. Since $m(t) = 0$, all eigenvalues are roots of $m(x)$. It remains to show that every root of $m(x)$ is an eigenvalue of $t$. If $r$ is a root of $m(x)$, then we can write $m(x) = (x - r)\,p(x)$ for a polynomial $p(x)$ of smaller degree than $m(x)$. Since $m(x)$ is the minimal polynomial of $t$, $p(t)$ is not the zero map. Therefore, there is a vector $v \in V$ such that $p(t)(v) \neq 0$. Using this, we have $m(t)(v) = (t - r\,\mathrm{id}_V)\bigl(p(t)(v)\bigr)$. Using $m(t) = 0$ and distributing,
$$t\bigl(p(t)(v)\bigr) - r\,p(t)(v) = 0 \implies t\bigl(p(t)(v)\bigr) = r\,p(t)(v)$$
which means $r$ is an eigenvalue of $t$ with eigenvector $p(t)(v)$.
For a polynomial $f(x)$, if $f(T)$ is the zero matrix then $f(x)$ is divisible by the minimal polynomial of $T$.
Proof
Let $m(x)$ be the minimal polynomial of $T$. Then, by the division theorem for polynomials, $f(x) = q(x) \cdot m(x) + r(x)$, with $r(x)$ of degree strictly less than that of $m(x)$. Since $m(T) = f(T) = 0$, the remainder $r(T) = f(T) - q(T)m(T)$ must also be the zero matrix. This would contradict the minimality of $m$ unless $r$ were the zero polynomial.
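The division step is easy to reproduce symbolically. A sketch assuming SymPy, using the polynomials that will appear in Example 16.3 below ($m(x) = (x-1)(x-2)^2$ minimal, $f(x) = (x-1)(x-2)^3$ an annihilating polynomial):

```python
import sympy as sp

x = sp.symbols('x')

m = sp.expand((x - 1) * (x - 2)**2)   # minimal polynomial
f = sp.expand((x - 1) * (x - 2)**3)   # another polynomial with f(T) = 0

q, r = sp.div(f, m, x)
print(q, r)   # quotient x - 2, remainder 0: m divides f
```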
Cayley-Hamilton Theorem
If $T$ is a square matrix (or $t$ a linear map) with characteristic polynomial $c(x)$, then $c(T)$ is the zero matrix (or $c(t)$ the zero map). In particular, the minimal polynomial $m(x)$ of $T$ divides its characteristic polynomial.
Proof
Let $C = T - xI$, the matrix whose determinant is the characteristic polynomial $c(x) = c_n x^n + \cdots + c_1 x + c_0$. Since the product of a matrix and its adjugate (classical adjoint) is its determinant times the identity, we can write
$$c(x) \cdot I = \operatorname{adj}(C)\,C = \operatorname{adj}(C)\,(T - xI) = \operatorname{adj}(C)\,T - \operatorname{adj}(C) \cdot x$$
The left side is $c_n I x^n + \cdots + c_1 I x + c_0 I$. For the right, $\operatorname{adj}(C)$ is a matrix of polynomials, so it can be expressed as a polynomial with matrix coefficients: $\operatorname{adj}(C) = C_{n-1} x^{n-1} + \cdots + C_1 x + C_0$, with each $C_i$ a matrix of scalars, making the right side
$$\bigl[(C_{n-1}T)x^{n-1} + \cdots + (C_1 T)x + C_0 T\bigr] - \bigl[C_{n-1}x^n + \cdots + C_1 x^2 + C_0 x\bigr]$$
Equate the coefficients on each side:
$$\begin{aligned} c_n I &= -C_{n-1} \\ c_{n-1} I &= C_{n-1}T - C_{n-2} \\ &\;\;\vdots \\ c_1 I &= C_1 T - C_0 \\ c_0 I &= C_0 T \end{aligned}$$
Multiply the first equation by $T^n$, the second by $T^{n-1}$, and so on:
$$\begin{aligned} c_n T^n &= -C_{n-1}T^n \\ c_{n-1} T^{n-1} &= C_{n-1}T^n - C_{n-2}T^{n-1} \\ &\;\;\vdots \\ c_1 T &= C_1 T^2 - C_0 T \\ c_0 I &= C_0 T \end{aligned}$$
Adding all the equations, the right sides telescope to zero, giving
$$c_n T^n + c_{n-1} T^{n-1} + \cdots + c_0 I = c(T) = 0$$
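A direct check of the theorem on a sample matrix (a sketch assuming SymPy; the matrix is arbitrary):

```python
import sympy as sp

T = sp.Matrix([[1, 2, 0], [0, 3, 1], [4, 0, 1]])
c = T.charpoly()   # monic characteristic polynomial of T

# Evaluate c at T; all_coeffs() lists coefficients highest degree first.
coeffs = list(reversed(c.all_coeffs()))   # c_0, c_1, ..., c_n
cT = sum((coeffs[i] * T**i for i in range(len(coeffs))), sp.zeros(3, 3))
print(cT)  # the zero matrix, as Cayley-Hamilton predicts
```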
Thus, if $\lambda_1, \lambda_2, \ldots, \lambda_l$ are the eigenvalues of a linear transformation on an $n$-dimensional vector space (or of a square $n \times n$ matrix), then the characteristic polynomial factors as
$$c(x) = (-1)^n (x - \lambda_1)^{p_1} (x - \lambda_2)^{p_2} \cdots (x - \lambda_l)^{p_l}$$
and its minimal polynomial factors into
$$m(x) = (x - \lambda_1)^{q_1} (x - \lambda_2)^{q_2} \cdots (x - \lambda_l)^{q_l}$$
where $1 \leq q_i \leq p_i$ for each $i \in \{1, \ldots, l\}$.
Example 16.3
Find the minimal polynomial of the matrix
$$T = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 1 & 2 & 0 & 2 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
The characteristic polynomial is easily computed as $c(x) = (x-1)(x-2)^3$. Thus, by the Cayley-Hamilton Theorem, the minimal polynomial is either $(x-1)(x-2)$, $(x-1)(x-2)^2$, or $(x-1)(x-2)^3$. We compute to check:
$$(T - I)(T - 2I) = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
$$(T - I)(T - 2I)^2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
So $m(x) = (x-1)(x-2)^2$.
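The same computation in SymPy (a sketch; `is_zero_matrix` tests for the zero matrix):

```python
import sympy as sp

T = sp.Matrix([
    [2, 0, 0,  1],
    [1, 2, 0,  2],
    [0, 0, 2, -1],
    [0, 0, 0,  1],
])
I = sp.eye(4)

print(((T - I) * (T - 2 * I)).is_zero_matrix)      # False
print(((T - I) * (T - 2 * I)**2).is_zero_matrix)   # True, so m(x) = (x-1)(x-2)^2
```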
A square matrix or a linear map is diagonalizable iff its minimal polynomial $m(x)$ has simple roots, i.e. $m(x) = (x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_l)$ where the $\lambda_i$ are distinct.
Proof
Call $\lambda_1, \ldots, \lambda_l$ the distinct roots of the characteristic polynomial of the linear map $t$ on the $n$-dimensional vector space $V$. Note that $\lambda_1, \ldots, \lambda_l$ are the distinct roots of the minimal polynomial $m(x)$ of $t$.
Suppose $t$ is diagonalizable. Then there exists a basis $B = \langle v_1, \ldots, v_n \rangle$ with each $v_i$ an eigenvector of $t$. Consider the polynomial $p(x) = (x - \lambda_1)\cdots(x - \lambda_l)$. We claim $p(t) = 0$. It suffices to check $p(t)(v_i) = 0$ for all $v_i$. We know $p(t)(v_i) = p(\lambda_i)\,v_i$, where $\lambda_i$ denotes the eigenvalue associated with $v_i$; but $p(\lambda_i) = 0$ since $\lambda_i \in \{\lambda_1, \ldots, \lambda_l\}$ makes one of the factors vanish, thus $p(t)(v_i) = 0$. Thus, the minimal polynomial $m(x)$ must divide $p(x)$. Therefore, $m(x)$ also has simple roots, and is in fact equal to $p(x)$ since $\lambda_1, \ldots, \lambda_l$ are the distinct roots of $m(x)$. Thus, $t$ is diagonalizable $\implies m(x)$ has simple roots.
To prove the converse, we perform induction on the number of eigenvalues $l$, which (the roots being simple) is the degree of $m(x)$.
If $t$ has only one eigenvalue, then $m(x) = x - \lambda_1 \implies m(t) = t - \lambda_1\,\mathrm{id}_V = 0 \implies t = \lambda_1\,\mathrm{id}_V$, whose matrix with respect to any basis is diagonal.
Now let $m(x) = (x - \lambda_1)\cdots(x - \lambda_l)$ and let $p(x) = (x - \lambda_2)\cdots(x - \lambda_l)$, so that $m(x) = (x - \lambda_1)\,p(x)$. Suppose the statement is true whenever the minimal polynomial has fewer than $l$ distinct roots. Call $\hat{V} = \mathscr{N}(p(t))$. Since $t(\hat{V}) \subseteq \hat{V}$ (null spaces of polynomials in $t$ are stable under $t$, as shown earlier), we can define a linear map $\hat{t}: \hat{V} \to \hat{V}$ by $\hat{t}(v) = t(v)$ for $v \in \hat{V}$. Since $p(\hat{t})$ is the zero map on $\hat{V}$, the minimal polynomial $\hat{m}(x)$ of $\hat{t}$ divides $p(x)$, making its roots simple. Since there are at most $l - 1$ roots, all simple, the induction hypothesis gives a basis $\langle w_1, \ldots, w_k \rangle$ of $\hat{V}$ consisting of eigenvectors $w_j$ of $\hat{t}$ (hence of $t$) with associated eigenvalues $\mu_j \in \{\lambda_2, \ldots, \lambda_l\}$.
Since $\hat{V} = \mathscr{N}(p(t))$ has $k$ vectors in its basis, the nullity of $p(t)$ is $k$. Hence, $k + \operatorname{rank}(p(t)) = n$, the dimension of $V$.
Next, we show that $\mathscr{R}(p(t)) \subseteq \mathscr{N}(t - \lambda_1\,\mathrm{id}_V)$. If $v \in \mathscr{R}(p(t))$, we can find a $w \in V$ such that $v = p(t)(w) \implies (t - \lambda_1\,\mathrm{id}_V)(v) = (t - \lambda_1\,\mathrm{id}_V)\bigl(p(t)(w)\bigr)$; but since $(x - \lambda_1)\,p(x) = m(x)$, $(t - \lambda_1\,\mathrm{id}_V)(v) = m(t)(w) = 0$, proving the claim.
Let $m$ be the nullity of $(t - \lambda_1\,\mathrm{id}_V)$; by the inclusion above, $m + k \geq \operatorname{rank}(p(t)) + k = n$. Pick a basis $\langle v_1, \ldots, v_m \rangle$ of $\mathscr{N}(t - \lambda_1\,\mathrm{id}_V)$. Each $v_i$ is an eigenvector of $t$ associated with eigenvalue $\lambda_1$.
We claim $\langle v_1, \ldots, v_m, w_1, \ldots, w_k \rangle$ is a basis of $V$. Since $m + k \geq n$, it suffices to show these vectors are linearly independent. Suppose we can find scalars $\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_k$ such that
$$\alpha_1 v_1 + \cdots + \alpha_m v_m + \beta_1 w_1 + \cdots + \beta_k w_k = 0$$
Applying the map $(t - \lambda_1\,\mathrm{id}_V)$ to the above equation and considering that $(t - \lambda_1\,\mathrm{id}_V)(v_i) = 0$ and $(t - \lambda_1\,\mathrm{id}_V)(w_j) = (\mu_j - \lambda_1)\,w_j$, we simplify to
$$\beta_1(\mu_1 - \lambda_1)w_1 + \cdots + \beta_k(\mu_k - \lambda_1)w_k = 0$$
Since $\langle w_1, \ldots, w_k \rangle$ is a basis of $\hat{V}$, this implies $\beta_j(\mu_j - \lambda_1) = 0$ for all $j = 1, \ldots, k$. But $\mu_j - \lambda_1 \neq 0$ since $\mu_j \in \{\lambda_2, \ldots, \lambda_l\}$ and all the $\lambda_i$'s are distinct. Therefore, all of the $\beta_j = 0$, reducing the linear dependence relation to
$$\alpha_1 v_1 + \cdots + \alpha_m v_m = 0$$
Since $\langle v_1, \ldots, v_m \rangle$ is a basis for $\mathscr{N}(t - \lambda_1\,\mathrm{id}_V)$, we conclude $\alpha_i = 0$ for all $i = 1, \ldots, m$. Therefore, $\langle v_1, \ldots, v_m, w_1, \ldots, w_k \rangle$ is a basis of $V$ consisting of eigenvectors of $t$. Hence, $t$ is diagonalizable.
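In practice, the simple-roots condition can be tested without factoring: a polynomial has simple roots exactly when it shares no factor with its derivative. A sketch assuming SymPy, using the minimal polynomials from the examples above:

```python
import sympy as sp

x = sp.symbols('x')

# Example 16.3: m(x) = (x - 1)(x - 2)^2 has a repeated root,
# so that matrix is not diagonalizable.
m = sp.expand((x - 1) * (x - 2)**2)
print(sp.gcd(m, sp.diff(m, x)))   # x - 2: repeated root

# A minimal polynomial with simple roots signals diagonalizability.
m_simple = sp.expand((x - 1) * (x - 2))
print(sp.gcd(m_simple, sp.diff(m_simple, x)))   # 1: simple roots
```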
Powers of Transformations
For transformations $t: V \to V$ we can consider powers, as mentioned at the beginning of this chapter.
Example 16.4
For the derivative map $d/dx: \mathcal{P}_3 \to \mathcal{P}_3$, we have
$$a + bx + cx^2 + dx^3 \;\xrightarrow{\;d/dx\;}\; b + 2cx + 3dx^2$$
$$a + bx + cx^2 + dx^3 \;\xrightarrow{\;d^2/dx^2\;}\; 2c + 6dx$$
$$a + bx + cx^2 + dx^3 \;\xrightarrow{\;d^3/dx^3\;}\; 6d$$
and any higher power maps every polynomial to $0$.
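Representing $d/dx$ as a matrix makes the vanishing power visible (a sketch assuming NumPy and the basis $\langle 1, x, x^2, x^3 \rangle$):

```python
import numpy as np

# Matrix of d/dx on P_3 with respect to the basis <1, x, x^2, x^3>
D = np.array([
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
])

print(np.linalg.matrix_power(D, 3))  # nonzero: third derivatives of cubics survive
print(np.linalg.matrix_power(D, 4))  # the zero matrix: every fourth derivative vanishes
```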
For any transformation $t: V \to V$, the range spaces of the powers form a descending chain:
$$V \supseteq \mathscr{R}(t) \supseteq \mathscr{R}(t^2) \supseteq \cdots$$
and the null spaces form an ascending chain:
$$\{0\} \subseteq \mathscr{N}(t) \subseteq \mathscr{N}(t^2) \subseteq \cdots$$
Further, there is a $k > 0$ such that for powers less than $k$ the subsets are proper, i.e. if $j < k$ then $\mathscr{R}(t^j) \supset \mathscr{R}(t^{j+1})$ and $\mathscr{N}(t^j) \subset \mathscr{N}(t^{j+1})$, while if $j \geq k$ then $\mathscr{R}(t^j) = \mathscr{R}(t^{j+1})$ and $\mathscr{N}(t^j) = \mathscr{N}(t^{j+1})$.
($k = 1$ can happen if, for example, $t$ is invertible, as none of the subsets in the chain would then be proper.)
Proof
If $w \in \mathscr{R}(t^{j+1})$, so that $w = t^{j+1}(v)$ for some $v$, then $w = t^j(t(v))$, so $w \in \mathscr{R}(t^j)$. Thus $\mathscr{R}(t^{j+1}) \subseteq \mathscr{R}(t^j)$. If $\mathscr{R}(t^k) = \mathscr{R}(t^{k+1})$, then applying $t$ to both sides gives $\mathscr{R}(t^{k+1}) = \mathscr{R}(t^{k+2})$, and so on by induction: once the chain stops shrinking, it stays constant. Also, because $V$ is finite-dimensional and a proper subspace has dimension strictly less than its superspace, the chain must eventually stabilize.
Finally, by the rank-nullity theorem, the null spaces behave oppositely: they grow strictly exactly where the range spaces shrink strictly, and stabilize at the same power $k$.
Example 16.5
For the derivative map $d/dx: \mathcal{P}_3 \to \mathcal{P}_3$ from before, we have the following chain of range spaces:
$$\mathcal{P}_3 \supset \mathcal{P}_2 \supset \mathcal{P}_1 \supset \mathcal{P}_0 \supset \{0\}$$
and the following chain of null spaces:
$$\{0\} \subset \mathcal{P}_0 \subset \mathcal{P}_1 \subset \mathcal{P}_2 \subset \mathcal{P}_3$$
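The dimensions along both chains can be read off from ranks of powers (a sketch assuming NumPy, reusing the matrix of $d/dx$ from Example 16.4):

```python
import numpy as np

# Matrix of d/dx on P_3 with respect to the basis <1, x, x^2, x^3>
D = np.array([
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
])

for j in range(5):
    rank = np.linalg.matrix_rank(np.linalg.matrix_power(D, j))
    print(f"rank of D^{j} = {rank}, nullity = {4 - rank}")
# Ranks 4, 3, 2, 1, 0 mirror the chain P3 ⊃ P2 ⊃ P1 ⊃ P0 ⊃ {0}.
```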
Let $t$ be a transformation on an $n$-dimensional space.
The generalized range space (or closure of the range space) is $\mathscr{R}_\infty(t) = \mathscr{R}(t^n)$.
The generalized null space (or closure of the null space) is $\mathscr{N}_\infty(t) = \mathscr{N}(t^n)$.
Essentially, this is the point where the range space and null space stabilize, which happens after at most $n$ iterations.
As $j$ increases, the dimensions of the $\mathscr{R}(t^j)$'s fall while the dimensions of the $\mathscr{N}(t^j)$'s rise, so that $V$ is split between them.
For any linear $t: V \to V$, the restriction $t: \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ is bijective. Therefore $\mathscr{R}_\infty(t) \cap \mathscr{N}_\infty(t) = \{0\}$.
Proof
Let the dimension of $V$ be $n$. Because $\mathscr{R}(t^n) = \mathscr{R}(t^{n+1}) = t\bigl(\mathscr{R}(t^n)\bigr)$, the map $t: \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ is onto. Therefore, it is one-to-one, since an onto map from a finite-dimensional space to itself is also one-to-one.
Next, assume $v \in \mathscr{R}_\infty(t) \cap \mathscr{N}_\infty(t)$. Since $v$ is in the generalized null space, $t^n(v) = 0$. On the other hand, since $t$ is one-to-one on $\mathscr{R}_\infty(t)$ and a composition of one-to-one functions is one-to-one, $t^n$ is also one-to-one there. Only $0$ maps to $0$ under a one-to-one linear map, so $t^n(v) = 0$ implies $v = 0$.
As a result, the union $\mathscr{R}_\infty(t) \cup \mathscr{N}_\infty(t)$ spans $V$. In fact, if $\langle v_1, \ldots, v_k \rangle$ is a basis of $\mathscr{R}_\infty(t)$ and $\langle w_1, \ldots, w_l \rangle$ is a basis of $\mathscr{N}_\infty(t)$, then $\langle v_1, \ldots, v_k, w_1, \ldots, w_l \rangle$ is a basis of $V$. Moreover, with respect to this basis the matrix of $t$ is the block diagonal matrix
$$\begin{pmatrix} S & Z \\ Z & N \end{pmatrix}$$
where $Z$ is a block of zeroes, $S \in \mathcal{M}_{k \times k}$ is invertible, and $N \in \mathcal{M}_{l \times l}$ satisfies $N^l = 0$.
Proof
Let the dimension of $V$ be $n$. By rank-nullity, $k + l = n$, so $B = \langle v_1, \ldots, v_k, w_1, \ldots, w_l \rangle$ has $n$ vectors. To show $B$ is a basis of $V$, it suffices to show that this set is linearly independent, which follows from the fact that $\mathscr{R}_\infty(t) \cap \mathscr{N}_\infty(t) = \{0\}$. The block form follows because both subspaces are stable under $t$: $t$ acts invertibly on $\mathscr{R}_\infty(t)$ (spanned by the $v_i$'s) and nilpotently on $\mathscr{N}_\infty(t)$ (spanned by the $w_j$'s).
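A sketch assuming SymPy, verifying on a sample matrix that bases of $\mathscr{R}(t^n)$ and $\mathscr{N}(t^n)$ together give a basis of the whole space:

```python
import sympy as sp

T = sp.Matrix([[0, 1, 0], [0, 0, 0], [0, 0, 2]])
n = T.rows
Tn = T**n

range_basis = Tn.columnspace()   # basis of the generalized range space
null_basis = Tn.nullspace()      # basis of the generalized null space

B = sp.Matrix.hstack(*range_basis, *null_basis)
print(B.rank() == n)   # True: together they form a basis of the whole space
```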
For powers $j$ between $j = 0$ and $j = n$, the intersection of the range space and null space may be nontrivial.
Consider the shift map $n: \mathbb{C}^2 \to \mathbb{C}^2$ defined by
$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ x \end{pmatrix}$$
On the standard basis, this map's action gives a string
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \text{that is,} \qquad e_1 \mapsto e_2 \mapsto 0$$
Notice how $e_2 = \binom{0}{1}$ is both in the range space and the null space. Also observe that though $n$ is not the zero map, the function $n^2 = n \circ n$ is the zero map. This will be explored next.
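The shift map's matrix makes both observations immediate (a sketch assuming NumPy):

```python
import numpy as np

# Matrix of the shift map n on C^2: e1 -> e2, e2 -> 0
N = np.array([[0, 0], [1, 0]])

print(N @ np.array([1, 0]))  # [0 1]: e1 maps to e2
print(N @ N)                 # the zero matrix: n^2 is the zero map
```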